From Word-spotting to OOV Modeling
Author
Abstract
This paper explores one dimension along which word spotting and speech recognition differ: the nature of the background model. In word spotting, a relatively small number of keywords float on a sea of unknown words. In speech recognition, an occasional unknown word punctuates utterances that are otherwise completely in-vocabulary. Despite this difference in viewpoint, implementations of the two may become very similar in some circumstances. When transcribed data is available for a domain, word spotting benefits from the more detailed background model that such data can support [9]. The manner in which the background is modeled in these cases is reminiscent of speech recognition. For example, a large vocabulary with good coverage may be extracted from the corpus, so that relatively few words in an utterance remain unmodeled. In this case, the situation is qualitatively similar to OOV modeling in a conventional speech recognizer, except that the vocabulary is strictly divided into “filler” and “keyword”.
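The filler/keyword division described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the helper names (`build_vocabulary`, `label_utterance`), the frequency-based choice of filler words, and the `max_fillers` cutoff are all assumptions made for illustration, and a real system would model words acoustically rather than by text lookup.

```python
from collections import Counter


def build_vocabulary(transcripts, keywords, max_fillers):
    """Split corpus words into a keyword set and a filler set.

    Hypothetical helper: fillers are simply the most frequent
    non-keyword words in the transcribed corpus (an assumption).
    """
    counts = Counter(w for line in transcripts for w in line.lower().split())
    keyword_set = {k.lower() for k in keywords}
    fillers = [w for w, _ in counts.most_common() if w not in keyword_set]
    return keyword_set, set(fillers[:max_fillers])


def label_utterance(utterance, keyword_set, filler_set):
    """Tag each word as 'keyword', 'filler', or 'unmodeled' (OOV)."""
    labels = []
    for word in utterance.lower().split():
        if word in keyword_set:
            labels.append((word, "keyword"))
        elif word in filler_set:
            labels.append((word, "filler"))
        else:
            labels.append((word, "unmodeled"))
    return labels


transcripts = ["call the bank please", "call the office"]
keyword_set, filler_set = build_vocabulary(transcripts, ["bank"], max_fillers=2)
labels = label_utterance("call the clinic bank", keyword_set, filler_set)
```

With a larger `max_fillers`, fewer words fall into the "unmodeled" class, which is the regime the abstract compares to OOV modeling in a conventional recognizer.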
Similar resources
Cross-word sub-word units for low-resource keyword spotting
We investigate the use of sub-word lexical units for the detection of out-of-vocabulary (OOV) keywords in the keyword spotting task. Sub-word units based on morphological decomposition and character n-grams are compared. In particular, we examine the benefit of sub-word units that cross word boundaries. Experiments are performed on the IARPA Babel Turkish dataset. Our results demonstrate that cr...
A new approach for modeling OOV words
This paper addresses the problem of out-of-vocabulary (OOV) utterance detection in a small-vocabulary telephone keyword spotting system. We propose a new approach for modeling OOV words in this scenario. The paper adopts the semi-continuous hidden Markov model with multiple codebooks to model the keywords. We propose a two pass procedure...
Out-of-Vocabulary Word Modeling and Rejection for Spanish Keyword Spotting Systems
This paper presents a combination of out-of-vocabulary (OOV) word modeling and rejection techniques in an attempt to accept utterances embedding a keyword and reject utterances with nonkeywords. The goal of this research is to develop a robust, task-independent Spanish keyword spotter and to develop a method for optimizing confidence thresholds for a particular context. To model OOV words, we e...
Improving utterance verification using hierarchical confidence measures in continuous natural numbers recognition
Utterance Verification (UV) is a critical function of an Automatic Speech Recognition (ASR) system working on real applications where spontaneous speech, out-of-vocabulary (OOV) words and acoustic noises are present. In this paper we present a new UV procedure with two major features: a) Confidence tests are applied to decoded string hypotheses obtained from using word and garbage models that re...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...